Automatically Tracking Metadata and Provenance of Machine Learning Experiments

Authors

Sebastian Schelter

Joos-Hendrik Böse

Johannes Kirschnick

Thoralf Klein

Stephan Seufert

Abstract

We present a lightweight system to extract, store and manage metadata and provenance information of common artifacts in machine learning (ML) experiments: datasets, models, predictions, evaluations and training runs. Our system accelerates users in their ML workflow, and provides a basis for comparability and repeatability of ML experiments. We achieve this by tracking the lineage of produced artifacts and automatically extracting metadata such as hyperparameters of models, schemas of datasets or layouts of deep neural networks. Our system provides a general declarative representation of said ML artifacts, is integrated with popular frameworks such as MXNet, SparkML and scikit-learn, and meets the demands of various production use cases at Amazon.
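The idea of a declarative artifact representation with lineage tracking can be illustrated with a minimal sketch. This is not the authors' implementation or schema: the `Artifact` and `MetadataStore` classes, their field names, and the sample values are all hypothetical, and an in-memory dictionary stands in for a real metadata store.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List
import uuid


@dataclass
class Artifact:
    """Declarative record for one ML artifact (hypothetical schema)."""
    kind: str   # e.g. "dataset", "model", "training_run", "evaluation"
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    derived_from: List[str] = field(default_factory=list)  # ids of parent artifacts
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


class MetadataStore:
    """In-memory stand-in for a metadata store (illustrative only)."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Artifact] = {}

    def record(self, artifact: Artifact) -> Artifact:
        self._artifacts[artifact.id] = artifact
        return artifact

    def lineage(self, artifact_id: str) -> List[Artifact]:
        """Return all ancestors of an artifact, nearest first (breadth-first)."""
        seen, order = set(), []
        queue = list(self._artifacts[artifact_id].derived_from)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.add(current)
            order.append(self._artifacts[current])
            queue.extend(self._artifacts[current].derived_from)
        return order


# Illustrative values only; none of these come from the paper.
store = MetadataStore()
dataset = store.record(
    Artifact("dataset", "reviews-train", {"schema": {"text": "string", "label": "int"}})
)
model = store.record(
    Artifact("model", "logreg", {"hyperparameters": {"C": 1.0, "penalty": "l2"}})
)
run = store.record(
    Artifact("training_run", "run-1", derived_from=[dataset.id, model.id])
)
evaluation = store.record(
    Artifact("evaluation", "holdout", {"metric": "accuracy"}, derived_from=[run.id])
)

ancestors = [a.name for a in store.lineage(evaluation.id)]
```

Here `ancestors` lists the training run first, then the dataset and model it was derived from, which is the kind of lineage query that supports comparing and repeating experiments.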

Related resources

Augmenting geospatial data provenance through metadata tracking in geospatial service chaining

In a service-oriented environment, heterogeneous data from distributed data archiving centers and various geo-processing services are chained together dynamically to generate on-demand data products. Creating an executable service chain requires detailed specification of metadata for data sets and service instances. Using metadata tracking, semantics-enabled metadata are generated and propagate...

Declarative Model Discovery in Provenance Data for Aiding in Scientific Experiment Planning

Data provenance manages a collection of metadata cataloging the origin and history of data. In scientific workflows, this metadata supports scientific experiment planning. However, the amount of provenance data generated from scientific workflow executions can grow over time, making it infeasible to evaluate manually. Thus, mechanisms for automatically extracting and presenting knowledge from p...

Using Provenance to Extract Semantic File Attributes

Rich, semantically descriptive file attributes are valuable in many contexts, such as semantic namespaces and desktop search. Descriptive attributes help users to find files placed in seemingly-arbitrary locations by different applications. However, extracting semantic attributes from file contents is nontrivial. An alternative is to examine file provenance: how and when files are used, and the...

Using Provenance for Personalized Quality Ranking of Scientific Datasets

The rapid growth of eScience has led to an explosion in the creation and availability of scientific datasets that include raw instrument data and derived datasets from model simulations. A large number of these datasets are surfacing online in public and private catalogs, often annotated with XML metadata, as part of community efforts to foster open research. With this rapid expansion comes th...

A Foundational Ontology to Support Scientific Experiments

Provenance is a term used to describe the history, lineage or origins of a piece of data. In computationally intensive scientific experiments, data resources are produced at large scale. As more scientific data are produced, the importance of tracking and sharing their metadata grows. It is therefore desirable to make this metadata easy to access, share, reuse, integrate and reason about. To add...

Publication date: 2017
